Distant microphone speech recognition in a noisy indoor environment: combining soft missing data and speech fragment decoding

Authors

  • Ning Ma
  • Jon Barker
  • Heidi Christensen
  • Phil D. Green
Abstract

This paper examines the problem of distant microphone speech recognition in noisy indoor home environments. The noise background can be roughly characterised as a slowly varying noise floor in which a mixture of energetic but unpredictable acoustic events is embedded. Our solution to the problem combines two complementary techniques. First, a soft missing data mask is formed which estimates the degree to which energetic acoustic events are masked by the noise floor. This step relies on a simple adaptive noise model. Second, a fragment decoding system attempts to interpret the energetic regions that are not accounted for by the noise floor model. This component uses models of the target speech to decide whether fragments (time-frequency regions dominated by a single sound source) should be included in the target speech stream or not. The combined approach performs modestly better than speech fragment decoding without an adaptive noise floor. Our experiments also show that speech fragment decoding performs far better than soft missing data decoding in variable noise, achieving 73% keyword recognition accuracy at -6 dB SNR on the Grid corpus task and substantially outperforming multicondition training.
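The first step described in the abstract, a soft mask derived from an adaptive noise floor, can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothing rule, the sigmoid mapping, and the parameter names and values (`alpha`, `slope`, `offset_db`) are all assumptions chosen for illustration, and the input is assumed to be per-channel log energies in dB.

```python
import math


def adaptive_noise_floor(frames, alpha=0.9):
    """Track a slowly varying noise floor for each frequency channel.

    frames: list of frames, each a list of log energies (dB) per channel.
    alpha:  smoothing factor (hypothetical value); closer to 1 adapts more slowly.
    Assumption: the first frame contains noise only.
    """
    floor = list(frames[0])
    floors = []
    for frame in frames:
        # Rises are followed slowly (moving average); drops are followed at once,
        # so short energetic events do not pull the floor estimate upward much.
        floor = [min(x, alpha * f + (1.0 - alpha) * x) for f, x in zip(floor, frame)]
        floors.append(floor)
    return floors


def soft_mask(frames, floors, slope=0.5, offset_db=3.0):
    """Map the margin above the noise floor to a value in (0, 1).

    Values near 1 mark time-frequency points that stand clear of the floor;
    values near 0 mark points that are likely masked by it.
    The sigmoid shape and its parameters are illustrative assumptions.
    """
    return [
        [1.0 / (1.0 + math.exp(-slope * ((x - n) - offset_db)))
         for x, n in zip(frame, floor)]
        for frame, floor in zip(frames, floors)
    ]


# Toy input: two channels, with one energetic event in channel 1 at frame 1.
frames = [[10.0, 10.0], [10.0, 40.0], [10.0, 10.0]]
floors = adaptive_noise_floor(frames)
mask = soft_mask(frames, floors)
```

In this toy input the event in channel 1 receives a mask value close to 1, while points at the level of the floor receive values below 0.5. In the system the abstract describes, regions that the noise floor does not account for would then be passed to the fragment decoder, which is not sketched here.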

Similar resources

A hearing-inspired approach for distant-microphone speech recognition in the presence of multiple sources

This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman’s account of auditory scene analysis, in which innate primitive grouping ‘rules’ are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into l...

A new method for robust speech recognition based on missing data using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...

Soft harmonic masks for recognising speech in the presence of a competing speaker

The paper addresses the problem of recognising speech in the presence of a competing speaker. It uses a two stage ‘Speech Fragment Decoding’ system. The system works by first segmenting a spectro-temporal representation of the mixture into a number of fragments, such that each fragment is dominated by a single source. An ASR search is then extended to find the combination of speech model sequen...

Adaptive beamforming and soft missing data decoding for robust speech recognition in reverberant environments

This paper presents a novel approach to combine microphone array processing and robust speech recognition for reverberant multi-speaker environments. Spatial cues are extracted from a microphone array and automatically clustered to estimate localization masks in the time-frequency domain. The localization masks are then used to blindly design adaptive filters in order to enhance the source sign...

Combining noise compensation and missing-feature decoding for large vocabulary speech recognition in noise

In this paper we propose a combination of noise compensation and missing-feature decoding for large-vocabulary speech recognition in noisy environments. Two approaches for noise compensation have been studied. These are noise training and vector Taylor series expansion, aiming to compensate white Gaussian noise at various levels. This is followed by subband missing-feature decoding to reduce th...

Publication date: 2010